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A TWO-STATE MIXED HIDDEN MARKOV MODEL FOR RISKY 
TEENAGE DRIVING BEHAVIOR 

By John C. Jackson*, Paul S. Albert+J and Zhiwei Zhang'^’^ 
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Institute of Child Health and Human Development^ 

This paper proposes a joint model for longitudinal binary and 
count outcomes. We apply the model to a unique longitudinal study 
of teen driving where risky driving behavior and the occurrence of 
crashes or near crashes are measured prospectively over the first 18 
months of licensure. Of scientific interest is relating the two processes 
and predicting crash and near crash outcomes. We propose a two- 
state mixed hidden Markov model whereby the hidden state charac¬ 
terizes the mean for the joint longitudinal crash/near crash outcomes 
and elevated g-force events which are a proxy for risky driving. Het¬ 
erogeneity is introduced in both the conditional model for the count 
outcomes and the hidden process using a shared random effect. An 
estimation procedure is presented using the forward-backward algo¬ 
rithm along with adaptive Gaussian quadrature to perform numeri¬ 
cal integration. The estimation procedure readily yields hidden state 
probabilities as well as providing for a broad class of predictors. 

1. Introduction. The Naturalistic Teenage Driving Study (NTDS), spon¬ 
sored by the National Institute of Child Health and Human Development 
(NICHD), was conducted to evaluate the effects of experience on teen driv¬ 
ing performance under various driving conditions. It is the first naturalis¬ 
tic study of teenage driving and has given numerous insights to the risky 
driving behavior of newly licensed teens that includes evidence that risky 
driving does not decline with experience as discussed by Simons-Morton 
et al. (2011). During the study 42 newly licensed drivers were followed over 
the first 18 months after obtaining a license. The participants were paid for 
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their participation, and there were no dropouts. Driving took place primar¬ 
ily in southern Virginia among small cities and rural areas. For each trip, 
various kinematic measures were captured. A lateral accelerometer recorded 
driver steering control by measuring g-forces the automobile experiences. 
These recordings provide two kinematic measures: lateral acceleration and 
lateral deceleration. A longitudinal accelerometer captured driving behavior 
along a straight path and records accelerations or decelerations. Another 
measure for steering control is the vehicle’s yaw rate, which is the angular 
deviation of the vehicle’s longitudinal axis from the direction of the auto¬ 
mobile’s path. Each of these kinematic measures was recorded as count data 
as they crossed specified thresholds that represent normal driving behav¬ 
ior. Crash and near crash outcomes were recorded in two ways. First, the 
driver of each vehicle had the ability to self report these events. Second, 
video cameras provided front and rear views from the car during each trip. 
Trained technicians analyzed each trip the driver took using the video and 
determination of crash/near crash events made. Table 1 shows the aggre¬ 
gate data for the driving study. More information on the study can be found 
at http: //www. vtti . vt. edu. Our interests are the prediction of crash and 
near crash events from longitudinal risky driving behavior. Crash or near 
crash outcomes are our binary outcome of interest, while excessive g-force 
events are our proxy for risky driving. It is likely that crash/near crash out¬ 
comes are best described by some unobserved or latent quality like inherent 
driving ability. Previously, Jackson et al. (2013) analyzed the driving data 
using a latent construct where the previously observed kinematic measures 
describe the hidden state and the hidden state describes the CNC outcome. 
Our approach here characterizes the joint distribution of crash/near crash 
and kinematic outcomes using a mixed hidden Markov model where both 
outcomes contribute to the calculation of the hidden state probabilities. 

There is a previous literature on mixed hidden Markov models. Discrete¬ 
time mixed Markov latent class models are introduced by Langeheine and 


Table 1 

Kinematic measures and correlation with CNCs, naturalistic teenage driving study, 
* correlation computed between the CNC and elevated g-force event rates 


Category 

Gravitation force 

Frequency 

% Total events 

Correlation with CNCs* 

Rapid starts 

>0.35 

8747 

39.6 

0.28 

Hard stops 

< -0.45 

4228 

19.1 

0.76 

Hard left turns 

< -0.05 

4563 

20.6 

0.53 

Hard right turns 

>0.05 

3185 

14.4 

0.62 

Yaw 

6° in 3 seconds 

1367 

6.2 

0.46 

Total 


22,090 

100 

0.60 
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van de Pol (1994). A general framework for implementing random effects 
in the hidden Markov model is discussed by Altman (2007). In this work, 
the author presented a general framework for a mixed hidden Markov model 
with a single outcome. The mixed hidden Markov model presented by Alt¬ 
man unihes existing hidden Markov models for multiple processes, which 
provides several advantages. The modeling of multiple processes simultane¬ 
ously permits the estimation of population-level effects as well as allowing 
great flexibility in modeling the correlation structure because they relax the 
assumption that observations are independent given the hidden states. There 
are a variety of methods available for estimation of parameters in mixed hid¬ 
den Markov models. Altman (2007) performed estimation by evaluating the 
likelihood as a product of matrices and performing numerical integration 
via Gaussian quadrature. A quasi-Newton method is used for maximum 
likelihood estimation. Bartolucci and Pennoni (2007) extend the latent class 
model for the analysis of capture-recapture data, which takes into account 
the effect of past capture outcomes on future capture events. Their model 
allows for heterogeneity among subjects using multiple classes in the latent 
state. Scott (2002) introduces Bayesian methods for hidden Markov models 
which was used as a framework to analyze alcoholism treatment trial data 
[Shirley et al. (2010)]. Bartolucci, Lupparelli and Montanari (2009) use a 
fixed effects model to evaluate the performance of nursing homes using a 
hidden Markov model with time-varying covariates in the hidden process. 
Maruotti (2011) discusses mixed hidden Markov models and their estimation 
using the expectation-maximization algorithm and leveraging the forward 
and backward probabilities given by the forward-backward algorithm and 
partitioning the complete-data-log-likelihood into sub-problems. Maruotti 
and Rocci (2012) proposed a mixed non-homogeneous hidden Markov model 
for a categorical data. 

We present a model that extends the work of Altman (2007) in sev¬ 
eral ways. First, we allow the hidden state to jointly model longitudinal 
binary and count data, which in the context of the driving study repre¬ 
sent crash/near crash events and kinematic events, respectively. We also 
introduce an alternative method to evaluate the likelihood by first using 
the forward-backward algorithm [Baum et al. (1970)] followed by perform¬ 
ing integration using adaptive Gaussian quadrature. Implementation of the 
forward-backward algorithm allows for easy recovery of the posterior hidden 
state probabilities, while the use of adaptive Gaussian quadrature allevi¬ 
ates bias of parameter estimates in the hidden process [Altman (2007)]. 
Understanding the nature of the hidden state at different time points was of 
particular interest to us in our application to teenage driving, and our esti¬ 
mation procedure yields an efficient way to evaluate the posterior probability 
of state occupancy. 
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In this paper we first introduce a joint model for crash/near crash out¬ 
comes and kinematic events which allows the mean of each these to change 
according to a two-state Markov chain. We introduce heterogeneity in the 
hidden process as well as the conditional model for the kinematic events via 
a shared random effect. We then discuss an estimation procedure whereby 
the likelihood is maximized directly and estimation of the hidden states is 
readily available by incorporating the forward-backward algorithm [Baum 
et al. (1970)] in evaluating individual likelihoods. We apply our model to the 
NTDS data and show that these driving kinematic data and CNC events are 
closely tied via the latent state. We use our model results to form a predictor 
for future CNC events based on previously observed kinematic data. 


2. Methods. Here we present the joint model for longitudinal binary and 
count outcomes using two hidden states as well as the estimation procedure 
for model parameters. 


2.1. The model. Let bj = {bii,bi 2 ,..., binj be an unobserved binary ran¬ 
dom vector whose elements follow a two-state Markov chain (state “0” rep¬ 
resents a good driving state and state “1” represents a poor driving state) 
with unknown transition probabilities poi,Pio and initial probability distri¬ 
bution rg. We model the crash/near crash outcome Yij, where i represents 
an individual and j the month since licensure, using the logistic regression 
model shown in (1): 


( 1 ) 


Yij ~ Bernoulli(7rjj), 

logit(7rij) = log(mij) -L oq -L aibij -L a2Ui, 


where the log(mjj) is an offset to account for the miles driven during a par¬ 
ticular month. Treatment of the CNC outcome as binary is not problematic 
since more than 98% of the monthly observations had one or fewer CNCs 
observed. Although the log link is ideal for count data, it is a reasonable 
correction for the miles driven for the logit link when the risk of a crash is 
low. An alternative parameterization would be to include the miles driven 
as a covariate in the model. The parameter ai assesses the odds-ratio of a 
crash or near crash event when in the poor versus good driving state; this 
odds ratio is simply , while 02 reflects unaccounted for covariates beyond 
the hidden state. The Xij are the sum of the observed elevated g-force count 
data and this model incorporates heterogeneity among subjects introducing 
the random effect in the mean structure shown in (2): 


(2) 


Poisson (/ijj), 

\og{pLij) = log(mij) + I3q + Pibij + P2tj + IdzUi, 
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where Ui is the random effect with Gaussian distribution and tj reflects the 
month of observation since licensure for a particular individual (note that 
these observations are equally spaced and tj was not statistically signifi¬ 
cant when included in the CNC process, hence its omission in this part of 
the model). Here the parameter in (2), along with the variance of the 
random effect distribution, accounts for any variation not explained by the 
other terms included in the model and induces a correlation between out¬ 
comes. Next, we assume that is a Markov chain and that bij\ui 

is independent of bit\ui for j ^ t. The transition probabilities for the Markov 
chain must he between 0 and 1 , and the sum of transitioning from one state 
to either state must be 1. Hence, the transition probabilities are modeled as 

Poi(Mi): logit(Pr( 6 ij = = 0|ui)) = 701 -F + Ui, 

( 3 ) 

pio(Mi): logit(Pr( 6 ij = 0|6ij_i = l\ui)) = 710 + S 2 yi,j-i + 6*Ui, 

where the parameter 6* in (3) characterizes the degree of correlation be¬ 
tween transition probabilities among individuals. Two different types of cor¬ 
relations are described by 6. If <5* < 0, then this implies individuals have a 
tendency to remain in either state. If 5* > 0, then this implies some indi¬ 
viduals exhibit a tendency to transition more between states than others. 
This approach to describe the transitions in similar to that presented by 
Albert and Follmann (2007). Introducing random effects in this manner is 
of great beneht computationally, however, there is a downside risk in that 
this implementation assumes the processes are highly correlated. This is es¬ 
pecially important for the hidden process where if the correlation between 
transition probabilities is not very strong, biased estimates will result. We 
present these hndings in our simulation section. 


2.2. Estimation. Letting T represent all parameters included in the model 
discussed above, the likelihood for the joint model is 


^('i^;y,x) 


/(y|(b, u), T)ff(x|(b|u), T)/i(u; du 


( 4 ) 


N Ui 


E nn f{yij\{bij\u),'l))g{xij\{bij\u),^) 

b {i=lj=2 


N 


rii 






|u U(u;T)du, 


U=i j=3 ) 

where the summation associated with b represents all possible state 
quences for an individual and the initial state probabilities are given 


se- 

by 
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{rjo} and may include a subject-specific random effect. In (4) we assume 
the crash/near crash and kinematic event data are conditionally indepen¬ 
dent. We also assume the {ui} are independent and identically distributed 
and the observations for any driver are independent given the random effect 
Ui and the sequence of hidden states. Given these assumptions, the likeli¬ 
hood given in (4) simplifies to a product of one-dimensional integrals shown 
in (5): 


iCSiy.x) 

=n/lE riof{yi2\{bi2\ui),^)g{xi2\{bi2\ui),^) 

1 = 1“'“i [ bi 

Ui 

X YlPbij_ubij\nJ{yij\{bij\Ui),'i>) 
i=3 


X g{xij\{bij\ui),'i/) 

The different roles the random effects perform in this joint model are 
of particular interest. The purpose of the inclusion of the random effect 
in the conditional model for the kinematic data {{xi\{bi\ui))} provides a 
relaxation of the assumption that the observations for an individual are 
conditionally independent given the hidden states {bj} as well as accounting 
for overdispersion for the kinematic event data. The inclusion of the random 
effect in the transition probabilities allows the transition probabilities to vary 
across individuals, inducing a correlation between transition probabilities 
that induces a relationship between the kinematic events and CNC processes. 
Further, the random effect provides a departure from the assumption that 
the transition process follows a Markov chain. One could formulate a reduced 
model that includes a random effect only in the hidden process, in the hidden 
process and one or both observed processes, or only in observed processes. 

Several possibilities exist for maximizing the likelihood given by (5). Two 
common approaches are the Monte Carlo expectation maximization algo¬ 
rithm introduced by Wei and Tanner (1990) and the simulated maximum 
likelihood methods discussed by McCulloch (1997). In Section 2.3, we pro¬ 
pose a different method for parameter estimation that does not rely on 
Monte Carlo methods which are difficult to implement and monitor for con¬ 
vergence. Our method utilizes the forward-backward algorithm to evaluate 
the individual likelihoods conditional on the random effect. As we will show, 
incorporation of this algorithm provides a simpler means of computing the 
posterior probability of the hidden state at any time point than MCEM. 
Further, this method provides a straightforward approach for likelihood and 
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variance evaluation. This approach has the added benefit of producing the 
estimated variance-covariance matrix for parameter estimates as determined 
by the inverse of the observed information matrix. 

2.3. Maximizing the likelihood using an implementation of the forward- 
backward algorithm. In maximizing the likelihood given by (5), our ap¬ 
proach first evaluates the portion of the likelihood described by the observed 
data given the random effect and hidden states shown here: 

'^riof{yi2\{hi2\ui),^)g{xi2\{hi2\ui),'^) 
bi 

rii 

X n \uj{yij\{hj\ui),^)g{xij\{bij\ui),^) 
j=3 

using the forward-backward algorithm and subsequent numerical integration 
over the random effect via adaptive Gaussian quadrature. We then use a 
quasi-Newton method to maximize this result. The alteration of the forward- 
backward algorithm to accommodate joint outcomes is described as follows. 
Here, let the vectors Yjf. = (Yik, ■ • ■ with realized values {yik, ■ ■ ■ jVij)' 

and 'K^f. = ..., Xij)' with realized values {xik, ■. ■, xij)'. Decompose the 

joint probability for an individual as follows: 

Pr(6ij =m,Yi = yi,Xi = Xi\ui) 

(7) = Pr(IX^2 = yi2^ ^i2 = = Mui) 

X Pr(yT+i = vlj+i\Yl = yj^, {hj = m\ui)) 

X Pr(XT+i = xT+ilX /2 = xl^, (bij = m\ui)) 

= P^iYi2 = ^i2 = = m\ui) Pr(yT+i = ylj+i\{hj = m\ui)) 

X Pr(XT+i = xY^f\{bij = m\ui)) 

= aimij)zim{j) for m = 0,1, 

where 0*^0) and Zim{j) are referred to as the forward and backward quan¬ 
tities, respectively, and are 

aim{j) = Pr(i;2 = yh^^h = for j = 2,..., m, 

aim(l) =Pr((6ii = m\ui))Fr:{Yi 2 = yi 2 \{bi 2 =m\ui)) 

X Fr{Xi 2 = Xi2\{bi2 =m\ui)), 

Zim{j) = Pl'(yjj +1 = = X^ j_^_.^,bij = m\ui) 

for j = l,...,(ni-l). 


Zim{ni) = 1 for all i. 
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The aim{j) and Zim{j) are computed recursively in j by using the following: 
1 

aim{j) = X] = yi2^^i2 = Xi2^Kj-l = ^ «i) 

1=0 

1 

= = yi2\^i2^ = = i\Ui)pim\ui 

1=0 

X Pr(yij = yij,Xij = Xij\{bij = m\ui)) 

1 

= '^aii{j - l)Pim|ui Pr(Ty = yij,Xij = Xij\{bij = m\ui)) 

1=0 

1 

= - l)vim\ui ^A^ij=yij\{hj=m\ui)) 

1=0 

X Y’i{Xij =Xij\{bij =m\ui)) 


and 

1 

Zim{j) = Pr(^"i+i = 6 ij+i = /, {bij = m\ui)) 

1=0 

1 

= E P'(h ?+1 = S'.! W’ 5+1 = <'+il( 6 i;+. = i|«i))p„,|.. 

1=0 

X Pr(y^ j_|_i — yj j_|_i, Al jj_|_i — |6jj-|-i — ^|'Ui) 

1 

= Pr(yjj+i = yjj+i,Xij+i = Xij+i\{bij+i = l\ui)) 

1=0 
1 

= Y + l)PmZ|«i PrC^Li+l = yj,j+ll(^ij+l = %*)) 

1=0 

X Pr(ALjj_|_i —Xjj-|_iI(6jj-|_i — l\ui)). 


For any individual, the likelihood conditional on the random effect may 
be expressed as a function of the forward probabilities, so for a two-state 
Markov chain the conditional likelihood for an individual is 

Lj|„. =Pr(Yi =yi,Xj =Xj|rtj) 

1 

= ^Pr(Yi =yi,Xi =Xj, 6 „. =l\ui) 

1=0 
1 

= ^ajz(nj|T,rii), 

1=0 


(8) 
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where aio{ni\'^,Ui) and an^riil'if ,Ui) are the forward probabilities for subject 
i associated with states 0 and 1, respectively, evaluated at the last observa¬ 
tion of the subject’s observation sequence n*. The marginal likelihood for an 
individual can now be found by integrating with respect to the random effect 

(9) Li= {aio{ni\^,Ui)+ aii{ni\^,ui)}h{ui)dui 

J Ui 

and the complete likelihood can be expressed as a product of individual 
likelihoods. 


2.4. Numerical integration. Adaptive Gaussian quadrature can be used 
to integrate (9). This technique is essential to obtaining accurate parameter 
estimates, as the integrand is sharply “peaked” at different values depend¬ 
ing on the observed measurements of the individual. Applying the results 
described in Liu and Pierce (1994) to the joint hidden Markov model, nu¬ 
merical integration of (9) is achieved by considering the distribution of the 
random effects to be N(f),6‘^). The procedure for obtaining maximum like¬ 
lihood estimates for model parameters are shown in Table 2. 


2.5. Estimation of posterior hidden state probabilities. The forward- 
backward implementation in evaluating the likelihood is not only efficient 
(the number of operations to compute the likelihood conditional on the ran¬ 
dom effect is of linear order as the observation sequence increases), but it also 
provides a mechanism for recovering information about the hidden states. 
By leveraging the forward and backward probabilities, we can compute the 
hidden posterior state probabilities {bij}: 

E{bij\yi,Xi) = E^.\y.^^.{E{bij\yi,Xi,Ui)} 

( 10 ) = / {Pribij = l\{yi,Xi,Ui))}hu^\(yi,xi)dui 


_ f j E^jbjj = 1, (yi,Xj),Ui) 


H 1 


Pr(yi,Xi) 


dui, 


Table 2 

Procedure for obtaining maximum likelihood estimates for the joint mixed hidden Markov 

model 

(1) Select initial parameter estimates p°. 

(2) Compute the set of adaptive quadrature points for each individual qi given the 
current parameter estimates p™'. 

(3) Maximize the likelihood obtained using the forward-backward algorithm and adap¬ 
tive quadrature via qi £Q using the quasi-Newton method. 

(4) Update parameter estimates 

(5) Repeats steps (2)-(4) until parameters converge. 
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since 


Pr:{bij = 1, (yi,Xj),iij) = Pr(6jj = 1, (yi,Xi)|ui)/i(iij) 

= aii{j)zii{j)h{ui), 


equation ( 10 ) can be expressed as 


( 11 ) 



Pijbjj = 1, (yi,Xj),Uj) 
Pr(yAXj) 


dui 



aii{j)zii{j)h{ui) 

+aii(j))h(ui)dui} 


dui- 


Evaluation of (11) is accomplished via adaptive Gaussian quadrature as 
outlined in the earlier section and the quantities of interest in ( 11 ) are readily 
available after running the forward-backward algorithm. 

Using a shared random effect is also attractive in that it is computation¬ 
ally more efficient than incorporating separate random effects for the count 
outcome and transition probabilities. Generally speaking, once the number 
of random effects exceeds three or four (depending on the type of quadra¬ 
ture being used and the number of nodes included for each integration), 
direct maximization is no longer a computationally efficient method and 
Monte Carlo expectation maximization (MCEM) is an appealing alterna¬ 
tive approach. In accounting for heterogeneity with a single random effect, 
we eliminate the need for MCEM. 


3. Simulation of the mixed model. We performed a simulation to investi¬ 
gate the performance of parameter estimation using the proposed approach. 
Under the shared random effect parameterization, we use the model in (12) 
for the simulation 


logit(7rij) = ao + + a2Ui, 

log(/Xij) = /3o -L Idibij -L /32Uj, 

(12) logit{Pr( 6 ij = l\bij_i = 0)} = 701 -L hyi,j-i + Ui, 

logit{Pr( 6 ij = l\bij_i = 0)} = 710 + 62 yi,j-i + d*Ui, 
logit( 6 ii = 1) = 7 ri, 

where Ui iV( 0 ,e^). 

The simulations were conducted with 20 observations on 60 subjects. Us¬ 
ing a 1.86-GHz Intel Core 2 Duo processor, the htting of the shared model 
took less than 3 minutes on average. The simulation results (1000 simula¬ 
tions) are shown in Table 3. Parameter estimates obtained using adaptive 
Gaussian quadrature with hve and eleven points, respectively, are presented 
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Table 3 

Parameter estimates for mixed hidden Markov model 1000 simulations (60 individuals, 
20 observations) using Q = 5 and Q = 11 quadrature points 


Parameter 

9 


Q = 5 



Q = ll 


0 

Oad 

o-e 

0 

dsd 

o-e 

QO 

-1.0 

-1.01 

0.12 

0.12 

-1.01 

0.11 

0.11 

ai 

2.0 

2.03 

0.18 

0.19 

2.03 

0.18 

0.17 

0.2 

1.5 

1.52 

0.41 

0.42 

1.51 

0.42 

0.44 

/3o 

-1.0 

-1.01 

0.10 

0.10 

-1.00 

0.09 

0.06 

01 

2.0 

2.01 

0.12 

0.09 

2.02 

0.11 

0.09 

02 

0.25 

0.255 

0.06 

0.07 

0.251 

0.06 

0.04 

701 

-0.62 

-0.61 

0.19 

0.18 

-0.62 

0.19 

0.17 

710 

0.4 

0.42 

0.34 

0.32 

0.42 

0.30 

0.30 

A 

0.0 

-0.03 

0.16 

0.15 

-0.03 

0.16 

0.16 

<5* 

2.00 

2.15 

0.45 

0.41 

2.02 

0.43 

0.42 

<5i 

1.00 

1.08 

0.22 

0.21 

1.02 

0.20 

0.21 

52 

3.00 

2.97 

0.41 

0.43 

2.97 

0.44 

0.42 

'^il 

-0.8 

-0.82 

0.06 

0.05 

-0.81 

0.05 

0.05 


along with true parameter value (9) and mean (6), the sample standard 
deviation for the parameter estimates 0sd, and the average asymptotic stan¬ 
dard errors a^. In performing the estimation using five quadrature points, 
parameter estimation was quite accurate with the exception of the coeffi¬ 
cient of the random effect 6 * in the 0-1 transition with an average estimated 
value of 2.15 compared to the actual value of 2.0. Other parameters display 
very little bias. The bias for (5* virtually disappears when performing the 
integration via adaptive Gaussian quadrature using ten points where the 
average estimated value was (5* = 2.02. These results were unchanged when 
evaluating for possible effects due to the total number of subjects or obser¬ 
vations (varying n and I). Additionally, the average asymptotic standard 
errors agree quite closely to the sample standard deviations for all model 
parameters. Similar results hold for the case where random effects are only 
included in the hidden process. We used different starting values to examine 
the sensitivity of estimation to initial values, and our proposed algorithm 
was insensitive to the selection of these values. Simulation results provide 
support that the complexity of the model does not inhibit parameter es¬ 
timation, rendering discovery of heterogeneity in the observed and hidden 
processes as a nice byproduct for the model. 

While performance of the estimation procedure for the mixed hidden 
Markov model in (12) was quite good, this model formulation assumes near 
perfect correlation between the random effects. To examine the robustness 
of the estimation procedure to this assumption, we evaluated model perfor- 
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Table 4 

Parameter estimates for the hidden process when the true underlying random effects 
distribution is correlated. 1000 simulations (60 individuals, 20 observations) 


Parameter 

True value 

p = 1 

p = 0.95 

p = 0.9 

p = 0.8 

p = 0.7 

p = 0.6 

710 

-0.62 

-0.620 

-0.628 

-0.640 

-0.67 

-0.69 

-0.72 

701 

0.4 

0.401 

-0.399 

0.398 

0.32 

0.28 

0.26 

(5 

2.0 

2.00 

2.11 

2.28 

2.44 

2.61 

2.68 


mance in the case where there are correlated random effects in the hidden 
process. For this case, data was simulated for the transition probabilities 
using (13): 


logit{Pr( 6 jj = l\bij_i = 0)} = 701 + ui, 

(13) 

logit{Pr( 6 jj = l\bij_i = 0)} = 710 + U 2 , 

where (mi,M 2 ) follow a bivariate normal distribution BVN{0, Xl). Our model 
was then fit to the simulated data using the parameterization in (14): 


logit{Pr( 6 jj = l\bij-i = 0)} = 701 + u*, 

(14) 

logit{Pr(6ij = l\bij-i = 0)} = 710 + Sui. 

The results for this simulation are shown in Table 4. For moderate depar¬ 
tures from perfect correlation, biased estimates result in the hidden process. 
Thus, we recommend first considering correlated random effects in the hid¬ 
den process before use of the shared random effects model. In the case of 
our application, the transition process exhibited very high correlation, so 
we proceed in the analysis with our estimation procedure. Details for es¬ 
timation using bivariate adaptive Gaussian quadrature are shown in the 
supplementary material [Jackson, Albert and Zhang (2015)]. 


4. Results. The two-state mixed hidden Markov model presented in the 
previous sections was applied to the NTDS data. Table 5 displays the pa¬ 
rameter estimates and associated standard errors. The initial probability 
distribution for the hidden states was common to all individuals in the study 
and modeled using logit (ri) = tti and the random effects distribution was 
N(0,e^). Several models were compared based on the relative goodness-of- 
fit measure, AIC. Model excursions included evaluating the suitability of a 
three-state hidden Markov model without random effects {AIC = 3563.06) 
(model shown in the supplementary material [Jackson, Albert and Zhang 
(2015)1), the two-state model without random effects {AIC = 3548.97), and 
the two-state model with random effects {AIC = 3492.73). 
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Table 5 

Parameter estimates for the mixed hidden Markov model as 
applied to the NTDS data 


Parameters 

Estimate 

Std Err 

QO 

-7.48 

0.14 

ai 

1.49 

0.25 

Cx.2 

0.03 

0.02 

Po 

-5.97 

0.13 

/3i 

1.31 

0.06 

P2 

0.007 

0.004 


1.10 

0.04 

A 

-0.18 

0.33 

6* 

1.25 

0.32 

710 

-2.13 

0.35 

701 

-3.47 

0.28 

5i 

1.75 

0.24 

52 

-2.17 

0.53 

m 

-0.83 

0.28 


The fixed effects model provided initial parameter estimates for most 
model parameters while multiple starting values for A were used in con¬ 
junction with a grid search over parameters j3^ and 5* to determine these 
initial values. The number of quadrature points implemented at each iter¬ 
ation was increased until the likelihood showed no substantial change. As 
illustrated in the simulation study, there were eleven points used in the 
adaptive quadrature routine. Standard error estimates were obtained using 
a numerical approximation to the Hessian using the {nlm} package in R. 
The coefficients of the hidden states ai and Pi are both significantly greater 
than zero, indicating that drivers operating in a poor driving state {bij = 1) 
are more likely to have a crash/near crash event and, correspondingly, a 
highly number of kinematic events. While the variance component of the 
random effect is somewhat small, the dispersion parameter is highly sig¬ 
nificant, indicating the data are overdispersed. Interestingly, heterogeneity 
is not exhibited in the CNC outcome as the coefficient for the random effect 
02 is insignificant, providing support to the notion that the hidden state is 
capturing unobserved quantities in a meaningful way. There is evidence of 
heterogeneity across individuals in their propensity to change between states 
as indicated by A and 5*. In the case of the NTDS data, 5* > 0 indicates a 
positive correlation between the transition probabilities, meaning that some 
individuals are prone to changing more often between states than others. Co¬ 
efficients in the hidden process, and <52, illustrate that transition between 
states depends on previous CNC outcomes. A prior crash was associated 
with an increased probability of transitioning from the good driving state to 
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the poor one ((5i = 1.75) and a decreased probability of transitioning from 
the poor to the good driving state {82 = 2.17). Since the shared random ef¬ 
fect, which assumes a perfect correlation between the random components, 
may not be robust to a more flexible random effects structure, we also ht 
the model using correlated random effects in the hidden process (see sim¬ 
ulation results). For a variety of starting values, the correlation coefficient 
estimates were near 1 (0.998 or greater), giving us conhdence in using the 
shared random effect approach. 

An interpretation of parameter estimates given in Table 5 is subject- 
specific and depends on a driver’s exposure for a given month. If we consider 
a subject driving the average mileage for all subjects (358.1 miles), parame¬ 
ter estimates indicate that the risk of a crash/near crash outcome increases 
from 0.16 to 0.47 when in the poor driving state, hij = 1. Correspondingly, 
this “average” subject would also expect to experience 2.43 more kinematic 
events on average when in the poor driving state. For the typical teenager, 
the likelihood of moving out of the poor driving state decreases from 10.6% 
to 1.3% when experiencing a CNC event in the previous month. Similarly, 
the likelihood of moving out of the good driving state increases from 3.01% 
to 15.2% when experiencing a CNC event in the previous month. 

A receiver operating characteristic (ROC) curve was constructed to de¬ 
termine the predictive capability of our model by plotting the true positive 
rate versus the false positive rate for different cutoff values. The ROC curve 
based on the “one-step ahead” predictions [observed outcome given all previ¬ 
ous kinematic observations Pr(yij = l\yii ,..., yij-i,Xii ,..., Xij-i)] is shown 
in Figure 1. An attractive feature of our model is that it allows for the de¬ 
velopment of a predictor based on prior kinematic events. We constructed 
this ROC curve using a cross-validation approach whereby one driver was 



Fig. 1. ROC curve for the mixed hidden Markov model based one “one-step ahead” 
predictions (area under the curve =0.74j. 
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Month Month Month 


Fig. 2. Predicted value of the hidden state given the observed data for three drivers. The 
(o) indicates the probability of being in state 1 (poor driving), (-\-) indicates a crash/near 
crash event and the dotted line indicates the composite kinematic measure. 

removed from the data set, model parameters were then determined using 
the remaining data and these results were then used to predict the removed 
driver’s crash/near crash outcomes. The predictive accuracy of this model 
was moderately high with an area under the curve of 0.74. Although the 
goodness of fit was best for the two-state mixed hidden Markov model, area 
under the curve for the other models was nearly identical. 

A sample of three drivers and their corresponding hidden state proba¬ 
bility E„.|y. xi{E(6jj|yj,Xj,Ui)} along with their crash/near crash outcomes 
and total number of kinematic events is shown in Eigure 2. It is evident 
how the total kinematic measures influence the predicted value of the hid¬ 
den state and work particularly well for cases where driving is “consistent” 
over relatively short time periods (i.e., low variation in kinematic measures 
for a given time period). In cases where the driving kinematics exhibit a 
great deal of variability, the model does not perform as well in predicting 
crash/near crash outcomes as indicated by the rightmost panel in Figure 2. 
As a comparison, we show the results of global decoding of the most likely 
hidden state sequence using the Viterbi algorithm in Figure 3 for the same 
three drivers. Hidden state classification is similar whether using global or 





d 



Fig. 3. Comparison of local and global decoding of the hidden states and CNC outcomes. 
The (o) indicates the probability of being in state 1 (poor driving), (-\-) indicates a CNC 
event and the (A) indicates the hidden state occupation using the Viterbi algorithm. 
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local decoding for the left two panels in Figure 3, which is indicative of most 
drivers in the study, while there are differences in the case of the rightmost 
panel likely due to the greater variability in these data. 

5. Discussion. In this paper we presented a mixed hidden Markov model 
for joint longitudinal binary and count outcomes introducing a shared ran¬ 
dom effect in the conditional model for the count outcomes and the model 
for the hidden process. An estimation procedure incorporating the forward- 
backward algorithm with adaptive Gaussian quadrature for numerical in¬ 
tegration is used for parameter estimation. A welcome by-product of the 
forward-backward algorithm is the hidden state probabilities for an individ¬ 
ual during any time period. The shared random effect eliminates the need 
for more costly numerical methods in approximating the likelihood, such as 
higher dimensional Gaussian quadrature or through Monte Garlo EM. 

The model was applied to the NTDS data and proved to be a good pre¬ 
dictor of crash and near crash outcomes. Our estimation procedure also 
provides a means of quantifying teenage driving risk. Using the hidden state 
probabilities which represent the probability of being in a poor driving state 
given the observed crash/near crash and kinematic outcomes, we can analyze 
the data in a richer way than standard summary statistics. Additionally, our 
approach allows for a broader class of predictors whereby the investigator 
may make predictions based on observations that go as far into the past as 
warranted. 

There are limitations to our approach. The shared random effect proposes 
a rather strong modeling assumption in order to take advantage of an appeal¬ 
ing reduction in computational complexity. Using more general correlated 
random effects approaches is an alternative, but others have found that iden¬ 
tification of the correlation parameter is difficult [Smith and Moffatt (1999) 
and Alfo, Maruotti and Trovato (2011)]. Formal testing for heterogeneity in 
these models is also a challenging problem [Altman (2008)]. 

There is also a potential issue of having treated the miles driven during 
a particular month (mjj) as exogenous. For some crashes, it is possible that 
previous CNC outcomes (yjj_i) may affect the miles driven in the following 
month and our model does not capture this dynamic. As with any study, 
greater clarity in the information obtained for each trip might yield more 
valuable insights. Metrics such as the type of road, road conditions and trip 
purpose, while potentially useful, were not available for this analysis. 

There are extensions to the model that may be useful. The model can 
address more than two outcomes. We summarize the kinematic events at 
a given time as the sum across multiple types. This approach could be ex¬ 
tended to incorporate multiple correlated processes corresponding to each 
kinematic type. Depending on the situation, the additional flexibility and 
potential benefits of such an extension may be worth the increased compu¬ 
tational cost. 
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SUPPLEMENTARY MATERIAL 

Adaptive qnadratnre for the three-state mixed hidden Markov model 

(DOI: 10.1214/14-AOAS765SUPP; .pdf). We provide details on the adaptive 
quadrature routine for the MHMM with bivariate normal random effects in 
the hidden process, as well as expressions for the three-state hidden Markov 
model. 
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